Discriminative Lexicon Adaptation for Improved Character Accuracy - A New Direction in Chinese Language Modeling

Authors

  • Yi-Cheng Pan
  • Lin-Shan Lee
  • Sadaoki Furui
Abstract

While out-of-vocabulary (OOV) words are a problem for ASR in most languages, in Chinese the problem can be avoided by using character n-grams, which yield moderate performance. However, character n-grams have their own limitations, and properly adding new words can improve ASR performance. Here we propose a discriminative lexicon adaptation approach for improved character accuracy, which not only adds new words but also deletes some words from the current lexicon. Unlike other lexicon adaptation approaches, we take the acoustic features into account and make our lexicon adaptation criterion consistent with that used in the decoding process. The proposed approach not only improves ASR character accuracy but also significantly enhances the performance of a character-based spoken document retrieval system.


Similar resources

Lexicon adaptation with reduced character error (LARCE) - a new direction in Chinese language modeling

Good language modeling relies on good predefined lexicons. For Chinese, since there are no text word boundaries and the concept of “word” is not very well defined, constructing good lexicons is difficult. In this paper, we propose lexicon adaptation with reduced character error (LARCE), which learns new word tokens based on the criterion of reduced adaptation corpus error rate. In this approach...


Language modeling of Chinese personal names based on character units for continuous Chinese speech recognition

In this paper, we analyze Chinese personal names to model their statistical phonotactic characteristics for continuous Chinese speech recognition. The analysis showed language-specific characteristics of Chinese personal names and strongly suggested the advantage of character-unit oriented modeling. A hierarchical language model was composed by reflecting statistical phonotactic characteristics ...


Improved Chinese broadcast news transcription by language modeling with temporally consistent training corpora and iterative phrase extraction

In this paper an iterative Chinese new phrase extraction method based on the intra-phrase association and context variation statistics is proposed. A Chinese language model enhancement framework including lexicon expansion is then developed. Extensive experiments for Chinese broadcast news transcription were then performed to explore the achievable improvements with respect to the degree of tem...


Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark

Recent deep learning based methods have achieved the state-of-the-art performance for handwritten Chinese character recognition (HCCR) by learning discriminative representations directly from raw data. Nevertheless, we believe that the long-and-well investigated domain-specific knowledge should still help to boost the performance of HCCR. By integrating the traditional normalization-cooperated ...


Lexicon Optimization for Chinese Language Modeling

In this paper, we present an approach to lexicon optimization for Chinese language modeling. The method is an iterative procedure consisting of two phases, namely lexicon generation and lexicon pruning. In the first phase, we extract appropriate new words from a very large training corpus using statistical approaches. In the second phase, we prune the lexicon to a preset memory limitation using...




Publication date: 2009